1.  Introduction.

The  following  two  problems  were  posed  in  response  to  a  question  verbally  posed  by  two 
geneticists.  It  turned  out  that  the  problems  did  not  address  their  questions  which  could  be 
answered  by  reference  to  U  statistics.  This  left  two  problems  of  some  theoretical  interest, 
but  with  no  apparent  application.  The  recall  of  previous  unpublished  work  by  Cornfield  and 
Greenhouse  (1975)  led  to  subsequent  discussions  with  S.  Greenhouse  and  J.  L.  Gastwirth 
which  suggest  potential  applications  of  these  problems  to  issues  in  discrimination. 

Consider independent pairs of independent Bernoulli observations on two sequences of probabilities $p^{(1)} = \{p_i^{(1)} : 1 \le i \le n\}$ and $p^{(2)} = \{p_i^{(2)} : 1 \le i \le n\}$. In biostatistical and discrimination applications one is often interested in knowing whether the $p_i^{(1)}$ tend to be greater than the $p_i^{(2)}$. The data provide only four useful items of information for this situation involving $2n$ parameters. These are the numbers of pairs which consist of (0,0), (0,1), (1,0) and (1,1) respectively, and can be summarized in a two-way table with entries $n_{00}$, $n_{01}$, $n_{10}$ and $n_{11}$ adding up to $n$. In many such applications it is reasonable to formulate a test of the null hypothesis $H_0 : p_i^{(1)} = p_i^{(2)}$, $1 \le i \le n$, by postulating that the odds ratios

$\psi_i = \frac{p_i^{(1)}(1 - p_i^{(2)})}{p_i^{(2)}(1 - p_i^{(1)})}, \quad i = 1, 2, \ldots, n$

are all equal to a common value $\psi$ and to test whether $\psi = 1$. Several recent examples are Gastwirth and Greenhouse (1987) and Yu (1993).

An interesting case for analysis is that where $n_{00} = n_{11} = 1{,}000$, $n_{10} = 20$ but $n_{01} = 5$. While the overall success rates for both cases are almost equal, it is clear that the discrepancy between $n_{10}$ and $n_{01}$ is statistically significant and could lead to rejecting $H_0$ if that hypothesis were seriously intended. While this example would fail to prove that one treatment is much better than another (in a case where two treatments were applied to $n$ matched pairs of individuals), the McNemar test would clearly demonstrate that there is a small subpopulation on which the treatments have a decidedly different effect. In other words, it would be evidence of the presence in the population of an explanatory factor discriminating among supposedly matched pairs, and which may or may not be important to uncover.
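The significance claimed for this table can be checked with a short calculation. The following is a minimal sketch, assuming the standard exact (binomial) form of the McNemar test applied to the discordant counts only; the function names are illustrative and do not come from the paper.

```python
from math import comb


def mcnemar_exact(n10: int, n01: int) -> float:
    """Two-sided exact McNemar p-value based on the discordant pairs."""
    m = n10 + n01
    k = min(n10, n01)
    # Under H0 each discordant pair is (1,0) or (0,1) with probability 1/2.
    tail = sum(comb(m, j) for j in range(k + 1)) / 2 ** m
    return min(1.0, 2.0 * tail)


def mcnemar_chi2(n10: int, n01: int) -> float:
    """Uncorrected McNemar chi-square statistic (1 degree of freedom)."""
    return (n10 - n01) ** 2 / (n10 + n01)


# The example from the text: n10 = 20, n01 = 5.
p_value = mcnemar_exact(20, 5)
chi2 = mcnemar_chi2(20, 5)
print(f"chi-square = {chi2:.1f}, exact two-sided p = {p_value:.4f}")
```

The concordant counts $n_{00}$ and $n_{11}$ do not enter the calculation, which is why the near equality of the overall success rates does not mask the discrepancy.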

Another example, related to the problems we shall pose, is that where the data consist of $n_{00} = n_{11} = 0$ and $n_{10} = 100 = n_{01}$. In this case there is no indication that there is an overall tendency for the $p_i^{(1)}$ to exceed the $p_i^{(2)}$. Nevertheless in discrimination cases the data clearly show that subjects were treated differently, depending on the existence of some hidden factors.

The  problems  originally  posed  are  the  following. 

Problem 1.  Assuming that $H_0 : p_i^{(1)} = p_i^{(2)} = p_i$ for $1 \le i \le n$, where $p = \{p_i : 1 \le i \le n\}$ is unspecified, what can be said about the mean and variance of

$D = (n_{01} + n_{10})/n$ ?


Problem  2.  Test  the  hypothesis  Ho  using  D  as  a  test  statistic. 

Because of the paucity of data, it is unlikely that $D$ will be an effective test statistic for testing $H_0$ in most applications. Nevertheless, as the second example indicates, there may be situations where $D$ leads to rejecting $H_0$ and reveals the existence of an effective explanatory factor which may be worth discovering. We shall see that while it is impossible to estimate accurately the variances of estimates or the significance levels of tests, useful bounds on these may be derived. In Section 4 we shall generalize to the use of $(a_1 n_{10} + a_2 n_{01})/n$ as a test statistic. Note that as long as $a_1$ and $a_2$ have the same sign, the use of this generalization attacks the side issue of hidden factors rather than the usual issue of whether the $p_i^{(1)}$ tend to exceed the $p_i^{(2)}$.

In Section 5 we shall consider the case where there are three observations on $p$ and that where there are two observations on $p^{(1)}$ and one on $p^{(2)}$.

Almost  all  of  the  derivations  will  appear  in  the  appendices  which  will  make  extensive 
use  of  the  Geometry  of  Moments  presented  in  Karlin  and  Shapley  (1953).  Certain  aspects 
of the geometry of the space of $(n_{00}, n_{01}, n_{10}, n_{11})$ are discussed in Fienberg and
Gilbert  (1970)  and  in  Diaconis  (1977).  Fienberg  and  Gilbert  discuss,  among  others,  the 
set  on  which  there  is  a  common  odds  ratio.  Diaconis  is  interested  in  aspects  relevant  to 
exchangeability. 

2.  Problem  1. 

We may regard $D$ as the average of $n$ Bernoulli random variables $D_i$, where $D_i = 1$ if the i-th pair doesn't match, i.e., the pair consists of (1,0) or (0,1). Then the expectation and variance under $H_0$ are given by

$E_0 D_i = 2p_i(1 - p_i) = d_i^{(0)}$

and

$\mathrm{Var}_0(D_i) = d_i^{(0)}(1 - d_i^{(0)})$.

Thus

(2.1)  $\Delta_0 = E_0 D = n^{-1} \sum_{i=1}^{n} d_i^{(0)} = \mathcal{E}(d^{(0)}) = 2\mathcal{E}\{p(1 - p)\}$

where $\mathcal{E}$ stands for the average over the $n$ subscripted values. Similarly

(2.2)  $\sigma_0^2 = n \mathrm{Var}_0(D) = \mathcal{E}\{d^{(0)}(1 - d^{(0)})\} = \mathcal{E}\{2p(1 - p)(1 - 2p(1 - p))\}$

While $D$ can be used as an estimate of $\Delta_0$, $\sigma_0^2$ cannot be estimated consistently from the available data. On the other hand, it is easy to see that $0 \le \Delta_0 \le 2\mathcal{E}(p)[1 - \mathcal{E}(p)] \le 1/2$ and we shall show in Appendix A3 that for a given $\Delta_0$,

(2.3)  $\Delta_0/2 \le \sigma_0^2 \le \Delta_0(1 - \Delta_0)$

Note that the ratio

$\frac{\Delta_0(1 - \Delta_0)}{\Delta_0/2} = 2(1 - \Delta_0)$

ranges from 2 to 1 as $\Delta_0$ ranges from 0 to 1/2 and that the difference

$\Delta_0(1 - \Delta_0) - \Delta_0/2 = \Delta_0(1/2 - \Delta_0)$

ranges from 0 to 1/16 and back to zero, peaking at $\Delta_0 = 1/4$.

Treating the $D_i$ as i.i.d. Bernoulli random variables with common probability $\Delta_0$ would give the correct mean for $D$ but could possibly overestimate the variance by a factor of $2(1 - \Delta_0)$ which is close to 2 if $\Delta_0$ is small. That means that a naive confidence interval for $\Delta_0$ based on the assumption of a common probability would be conservative, and possibly by as much as a length factor of $\sqrt{2}$.
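The bounds in (2.3) are easy to probe numerically. The sketch below, which assumes nothing beyond the definitions in (2.1) and (2.2), draws arbitrary probability vectors and checks that the pair (mean, scaled variance) stays between the two bounds; the helper name and the random inputs are illustrative only.

```python
import random


def delta0_and_sigma0_sq(p):
    """Return (Delta_0, sigma_0^2) from (2.1) and (2.2) for a vector p."""
    d = [2 * x * (1 - x) for x in p]          # d_i^(0) = 2 p_i (1 - p_i)
    n = len(p)
    delta0 = sum(d) / n
    sigma0_sq = sum(x * (1 - x) for x in d) / n
    return delta0, sigma0_sq


rng = random.Random(0)
results = []
for _ in range(1000):
    p = [rng.random() for _ in range(rng.randint(1, 50))]
    results.append(delta0_and_sigma0_sq(p))

# Small tolerance guards against floating point rounding at the boundary.
bounds_hold = all(
    d0 / 2 - 1e-12 <= s <= d0 * (1 - d0) + 1e-12 for d0, s in results
)

# Lower bound is attained when every p_i is 0, 1/2 or 1.
low = delta0_and_sigma0_sq([0.5, 0.0, 1.0, 0.0])
# Upper bound is attained when every p_i gives the same d_i^(0).
high = delta0_and_sigma0_sq([0.3, 0.7, 0.3, 0.7])
print(bounds_hold, low, high)
```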

More precise bounds are derived in Appendix A3 making use of $\pi = \mathcal{E}(p)$ which can also be estimated from the data. Using $\pi$ leads to relatively minor improvement of the upper bound. It has no effect on the lower bound in the triangle of $(\pi, \Delta_0)$ values with vertices (0,0), (1,0), and (1/2, 1/2), but it leads to substantial improvement near the upper boundary where $\Delta_0 = 2\pi(1 - \pi)$.


3.  Problem  2. 

Since $\Delta_0 = E_0 D$ in Problem 1 can range from 0 to 1/2, it follows that $D$ can be used to reject $H_0$ only if $D$ is significantly greater than 1/2. However if $\mathcal{E}(p)$ were known, then $\Delta_0 = E_0 D = 2\mathcal{E}(p(1 - p)) \le 2[\mathcal{E}(p) - [\mathcal{E}(p)]^2]$ and we could reject $H_0$ for values of $D < 1/2$ provided they were significantly greater than $2\mathcal{E}(p)[1 - \mathcal{E}(p)]$. Not knowing $\mathcal{E}(p)$, we could estimate it and use as our test statistic

(3.1)  $T = D - 2\hat\pi(1 - \hat\pi)$

where $\hat\pi = (\hat\pi^{(1)} + \hat\pi^{(2)})/2$, $\hat\pi^{(1)} = (n_{10} + n_{11})/n$, and $\hat\pi^{(2)} = (n_{01} + n_{11})/n$. Under the general assumptions where $p^{(1)}$ is not necessarily equal to $p^{(2)}$, we define $p = (p^{(1)} + p^{(2)})/2$ and then $\hat\pi$, $\hat\pi^{(1)}$ and $\hat\pi^{(2)}$ are estimates of $\pi = \mathcal{E}(p)$, $\pi^{(1)} = \mathcal{E}(p^{(1)})$ and $\pi^{(2)} = \mathcal{E}(p^{(2)})$ respectively.
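As an illustration of (3.1), the following sketch computes $T$ from the four cell counts and applies it to the two tables discussed in the Introduction. The function name is hypothetical; only the formula is taken from the text.

```python
def t_statistic(n00: int, n01: int, n10: int, n11: int) -> float:
    """T = D - 2*pi_hat*(1 - pi_hat) from equation (3.1)."""
    n = n00 + n01 + n10 + n11
    d = (n01 + n10) / n                 # proportion of non-matching pairs
    pi1_hat = (n10 + n11) / n           # success rate of the first member
    pi2_hat = (n01 + n11) / n           # success rate of the second member
    pi_hat = (pi1_hat + pi2_hat) / 2
    return d - 2 * pi_hat * (1 - pi_hat)


# Second example of the Introduction: every pair is discordant.
t_hidden = t_statistic(0, 100, 100, 0)
# First example: few discordant pairs, so T is far below zero.
t_mcnemar = t_statistic(1000, 5, 20, 1000)
print(t_hidden, t_mcnemar)
```

In the first call $D = 1$ and $\hat\pi = 1/2$, so $T = 1/2$, the largest value $T$ can take. In the second, $T$ is negative, so a test based on $T$ would not reject even though the McNemar test does.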

We see in Appendix A4 that

(3.2)  $E(T) = [\Delta - 2\pi(1 - \pi)] + \frac{1}{2n}\mathcal{E}\{p^{(1)}(1 - p^{(1)}) + p^{(2)}(1 - p^{(2)})\}$

where

(3.3)  $\Delta = E(D) = \mathcal{E}(d)$

and $d_i = E(D_i) = p_i^{(1)}(1 - p_i^{(2)}) + p_i^{(2)}(1 - p_i^{(1)})$ for $1 \le i \le n$. The expression for $E(T)$ may be regarded as the sum of two terms, the second of which is $O(n^{-1})$ and is bounded from above by $[\pi^{(1)}(1 - \pi^{(1)}) + \pi^{(2)}(1 - \pi^{(2)})]/2n$. Under the hypothesis, the main term of $E_0(T)$ is $\Delta_0 - 2\pi(1 - \pi) = -2\sigma_p^2$ where $\sigma_p^2 = \mathcal{E}(p^2) - \pi^2$.

Neglecting terms of higher order, the variance of $T$ is seen to be approximated by $n^{-1}\tau^2$ where

(3.4)  $\tau^2 = \mathcal{E}\{4\pi^2 d - 4(2\pi - 1)pd - d^2\} - (2\pi - 1)^2 \mathcal{E}((p^{(1)} - p^{(2)})^2)$

Under the hypothesis $H_0$, this variance becomes

(3.5)  $\tau_0^2(\pi, \Delta_0) = 4\pi^2\Delta_0 - 12\pi^4 + (4 - 16\pi + 24\pi^2)(\pi - \Delta_0/2) - 4\mathcal{E}(p - \pi)^4$.
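The passage from (3.4) to (3.5) can be checked numerically. The sketch below evaluates (3.4) with $p^{(1)} = p^{(2)} = p$ and compares it with the closed form (3.5) for randomly generated vectors; the function names and the random test vectors are assumptions made for illustration.

```python
import random


def tau0_sq_direct(p):
    """Equation (3.4) with p1 = p2 = p, so the last term vanishes."""
    n = len(p)
    pi = sum(p) / n
    total = 0.0
    for x in p:
        d = 2 * x * (1 - x)
        total += 4 * pi ** 2 * d - 4 * (2 * pi - 1) * x * d - d ** 2
    return total / n


def tau0_sq_closed(p):
    """Equation (3.5), written in terms of pi, Delta_0 and E(p - pi)^4."""
    n = len(p)
    pi = sum(p) / n
    delta0 = sum(2 * x * (1 - x) for x in p) / n
    v1 = sum((x - pi) ** 4 for x in p) / n
    return (4 * pi ** 2 * delta0 - 12 * pi ** 4
            + (4 - 16 * pi + 24 * pi ** 2) * (pi - delta0 / 2) - 4 * v1)


rng = random.Random(1)
max_gap = 0.0
for _ in range(500):
    p = [rng.random() for _ in range(rng.randint(1, 30))]
    max_gap = max(max_gap, abs(tau0_sq_direct(p) - tau0_sq_closed(p)))
print(max_gap)
```

The identity relies on $\mathcal{E}(p^2) = \pi - \Delta_0/2$, which is how the second moment enters (3.5).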

To study the range of the main term in $E(T)$, $\Delta - 2\pi(1 - \pi)$, we demonstrate in Appendix A5 that

(3.6a)  $0 \le |\pi^{(1)} - \pi^{(2)}| \le \Delta \le 2\pi$ if $0 \le \pi \le 1/2$

and

(3.6b)  $0 \le |\pi^{(1)} - \pi^{(2)}| \le \Delta \le 2(1 - \pi)$ if $1/2 \le \pi \le 1$

and that these inequalities are sharp given $\pi^{(1)}$ and $\pi^{(2)}$. Without specifying $\pi^{(1)}$ and $\pi^{(2)}$, which can be estimated from the margins, we see that $(\pi, \Delta)$ lies in the triangle with vertices (0,0), (1/2, 1), and (1,0). Under the hypothesis $(\pi, \Delta_0)$ is restricted to the subset of the triangle under the parabola $\Delta_0 = 2\pi(1 - \pi)$. Where $(\pi, \Delta)$ lies in the triangle depends on the value of $\sigma_{12} = \mathcal{E}(p^{(1)}p^{(2)}) - \pi^{(1)}\pi^{(2)}$. When $(\pi, \Delta)$ lies above the parabola, the main term of $E(T)$ is positive, the hypothesis is not true, and we will be able to reject $H_0$ with enough data. If $(\pi, \Delta)$ lies below the parabola, the main term of $E(T)$ is negative and the hypothesis may or may not be true, but we will not be able to use $T$ to reject the hypothesis. Of course other test statistics could be effective if we were aiming seriously at testing $H_0$. In
particular  it  would  be  easy  to  detect  deviations  from 

To maximize $\tau_0$ subject to given values of $\Delta_0 = \mathcal{E}(2p(1 - p))$ and $\pi = \mathcal{E}(p)$ one must minimize

$V_1 = \mathcal{E}(p - \pi)^4$.

It is easy to see that the minimum of $V_1$, unrestricted by the condition $0 \le p \le 1$, is achieved by $p = \pi \pm \sigma_p$ each with probability 1/2. When $\pi - \sigma_p < 0$ or $\pi + \sigma_p > 1$, we see in Appendix A2 that the restricted minimum is achieved by one of the 2 point distributions assigning probability $\theta$ at $q$ and $1 - \theta$ at 0 or 1, for appropriate values of $q$ and $\theta$. In each case $\pi$ and $\Delta_0$ determine $q$ and $\theta$. Let

(3.7)  $V_2(\pi, \sigma) = \sigma^2(\sigma^6 + \pi^6)/[\pi^2(\sigma^2 + \pi^2)]$

The minimum of $V_1$ is

(3.8a)  $V_{1m} = \sigma_p^4$ if $\sigma_p \le \pi_m = \min(\pi, 1 - \pi)$,

and otherwise

(3.8b)  $V_{1m} = V_2(\pi_m, \sigma_p)$.

The maximum of $V_1$ is also attained by a two point distribution involving 0 or 1 if $0 < \pi_m \le 1/4$, and is then

(3.8c)  $V_{1M} = V_2(1 - \pi_m, \sigma_p)$

If $1/4 < \pi_m \le 1/2$ then $V_{1M}$ may be $V_2(1 - \pi_m, \sigma_p)$ or may be attained by a 3 point distribution involving 0, 1, and $q = \mathrm{med}(q_1, q_2, 2\pi - 1/2)$ where $q_1 = [\pi - \mathcal{E}(p^2)]/(1 - \pi)$ and $q_2 = \mathcal{E}(p^2)/\pi$.

Then $\tau_0$ is bounded above and below by $\tau_{0M}(\pi, \Delta_0)$ and $\tau_{0m}(\pi, \Delta_0)$ where these are derived from (3.5) by replacing $\mathcal{E}(p - \pi)^4$ by $V_{1m}$ and $V_{1M}$ respectively, and where

(3.9)  $\Delta_0 = 2[\pi(1 - \pi) - \sigma_p^2]$.
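The role of $V_2$ in (3.7) and (3.8) can be made concrete with a small sketch. It assumes the two point distribution on 0 and $q_2$ described in Appendix A1 and confirms that its fourth central moment equals $V_2$; the function names are illustrative, and only the minimum $V_{1m}$ is implemented.

```python
def v2(mu: float, sigma: float) -> float:
    """Equation (3.7): fourth central moment of the 2 point law on 0 and q."""
    return sigma ** 2 * (sigma ** 6 + mu ** 6) / (mu ** 2 * (sigma ** 2 + mu ** 2))


def v1_min(pi: float, sigma_p: float) -> float:
    """Equations (3.8a) and (3.8b): minimum of E(p - pi)^4 on [0, 1]."""
    pi_m = min(pi, 1 - pi)
    if sigma_p <= pi_m:
        return sigma_p ** 4          # symmetric 2 point law at pi +/- sigma_p
    return v2(pi_m, sigma_p)         # 2 point law involving 0 or 1


def fourth_moment_two_point_at_zero(mu: float, sigma: float) -> float:
    """Direct computation for mass theta at q = mu_2/mu and 1 - theta at 0."""
    mu2 = sigma ** 2 + mu ** 2
    q = mu2 / mu
    theta = mu ** 2 / mu2
    return theta * (q - mu) ** 4 + (1 - theta) * mu ** 4


direct = fourth_moment_two_point_at_zero(0.2, 0.3)
formula = v2(0.2, 0.3)
print(direct, formula, v1_min(0.5, 0.2), v1_min(0.2, 0.3))
```

At $\sigma_p = \pi_m$ the two branches of `v1_min` agree, since $V_2(\pi_m, \pi_m) = \pi_m^4$.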


For large $n$

(3.10)  $Z = n^{1/2}\,\frac{D - 2\hat\pi(1 - \hat\pi)}{\tau_0(\hat\pi, D)}$

should be approximately normally distributed with mean less than or equal to 0 and variance 1 when the hypothesis $H_0$ is true. The expectation of $T$ and the bounds on $\tau_0$ provide corresponding approximate bounds to the probability of rejection, when the hypothesis is true, for a test using $T$ as the test statistic.

For a given joint distribution for $(p^{(1)}, p^{(2)})$ it is possible to calculate $E(T)$ and $\tau$ and to estimate the corresponding noncentrality parameter and the power of the test of $H_0$. For illustrative and computational purposes a mixture of independent beta distributions of the form

$f(q_1, q_2) = \sum_{i=1}^{k} w_i \,\mathrm{Be}(q_1; \alpha_{1i}, \beta_{1i})\,\mathrm{Be}(q_2; \alpha_{2i}, \beta_{2i})$

might be suitable. To calculate bounds on the power function of the test without assuming a proposed distribution, we should calculate bounds on $\tau^2$ for given $\pi^{(1)}$, $\pi^{(2)}$ and $\Delta$, which may be estimated from the data. Of course if $\pi^{(1)}$ and $\pi^{(2)}$ are not close, their estimates would clearly indicate that $H_0$ is not true. But our use of $Z$ is directed more at detecting hidden explanatory factors than at testing the validity of $H_0$. In any case, bounding $\tau^2$ involves minimizing and maximizing the variance of $(p^{(1)} - \pi^{(1)})(p^{(2)} - \pi^{(2)})$ subject to specified values of $\pi^{(1)}$, $\pi^{(2)}$ and $\mathcal{E}(p^{(1)}p^{(2)})$. This problem is discussed in Appendix A6.

4.  Generalization  of  T. 

The test statistic $T$ treats the pair (1,0) the same as (0,1). To direct the test toward detecting specific alternatives where one of these pairs is more likely to occur than another, we may apply the test statistic

(4.1)  $T_1 = (a_1 n_{10} + a_2 n_{01})/n - (a_1 + a_2)\hat\pi(1 - \hat\pi)$

Then, we see in Appendix A4, that

(4.2)  $E T_1 = -(a_1 + a_2)\sigma_{12} + \frac{a_1 - a_2}{2}(\pi^{(1)} - \pi^{(2)}) + \frac{a_1 + a_2}{4}(\pi^{(1)} - \pi^{(2)})^2 + \frac{a_1 + a_2}{4n}\mathcal{E}[p^{(1)}(1 - p^{(1)}) + p^{(2)}(1 - p^{(2)})]$

where

(4.3)  $\sigma_{12} = \mathcal{E}(p^{(1)}p^{(2)}) - \pi^{(1)}\pi^{(2)} = \frac{1}{2}[\pi^{(1)}(1 - \pi^{(2)}) + \pi^{(2)}(1 - \pi^{(1)}) - \Delta]$

Also $\mathrm{Var}(T_1) = n^{-1}\tau_1^2$ plus higher order terms where

(4.4)  $\tau_1^2 = \mathcal{E}\{b_1^2 p^{(1)}(1 - p^{(1)}) + b_2^2 p^{(2)}(1 - p^{(2)}) + (a_1 + a_2)^2 p^{(1)}p^{(2)}(1 - p^{(1)}p^{(2)}) - 2(a_1 + a_2)p^{(1)}p^{(2)}[b_1(1 - p^{(1)}) + b_2(1 - p^{(2)})]\}$

where

(4.5a)  $b_1 = (a_1 + a_2)\pi + (a_1 - a_2)/2$

and

(4.5b)  $b_2 = (a_1 + a_2)\pi - (a_1 - a_2)/2$.

Incidentally $b_1 + b_2 = 2\pi(a_1 + a_2)$, $b_1 - b_2 = a_1 - a_2$, $b_1 b_2 = \pi^2(a_1 + a_2)^2 - [(a_1 - a_2)/2]^2$.
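The leading terms of (4.2), that is everything except the $O(n^{-1})$ correction, can be verified directly from the definitions. The sketch below computes the mean of $(a_1 n_{10} + a_2 n_{01})/n$ minus $(a_1 + a_2)\pi(1 - \pi)$ and compares it with the first three terms of (4.2). The function names and random inputs are assumptions for illustration.

```python
import random


def main_term_direct(a1, a2, p1, p2):
    """E[(a1*n10 + a2*n01)/n] - (a1 + a2)*pi*(1 - pi) from the definitions."""
    n = len(p1)
    pi1, pi2 = sum(p1) / n, sum(p2) / n
    pi = (pi1 + pi2) / 2
    mean_stat = sum(a1 * x * (1 - y) + a2 * y * (1 - x)
                    for x, y in zip(p1, p2)) / n
    return mean_stat - (a1 + a2) * pi * (1 - pi)


def main_term_formula(a1, a2, p1, p2):
    """First three terms on the right hand side of equation (4.2)."""
    n = len(p1)
    pi1, pi2 = sum(p1) / n, sum(p2) / n
    sigma12 = sum(x * y for x, y in zip(p1, p2)) / n - pi1 * pi2
    return (-(a1 + a2) * sigma12
            + (a1 - a2) / 2 * (pi1 - pi2)
            + (a1 + a2) / 4 * (pi1 - pi2) ** 2)


rng = random.Random(2)
max_gap = 0.0
for _ in range(500):
    n = rng.randint(1, 20)
    p1 = [rng.random() for _ in range(n)]
    p2 = [rng.random() for _ in range(n)]
    a1, a2 = rng.uniform(0, 3), rng.uniform(0, 3)
    gap = abs(main_term_direct(a1, a2, p1, p2)
              - main_term_formula(a1, a2, p1, p2))
    max_gap = max(max_gap, gap)
print(max_gap)
```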




5.  Multiple  Observations. 

The difficulties in bounding the basic parameters in the inferences in our problems are mitigated when more observations are available on each $p_i$.

5.1  Problem 1 With 3 Observations.

Suppose that in Problem 1, we had 3 observations leading to the data $n_0, n_1, n_2, n_3$ where $n_j$ is the number of $i$ values (trials) for which we observe $j$ successes. Then

(5.1)  $E_0(n_j/n) = \binom{3}{j}\mathcal{E}\{p^j(1 - p)^{3-j}\}$, $j = 0, 1, 2, 3$

and we can estimate $\mathcal{E}(p)$, $\mathcal{E}(p^2)$ and $\mathcal{E}(p^3)$, since

$E_0[(3n_3 + 2n_2 + n_1)/3n] = \mathcal{E}(p)$

$E_0[(3n_3 + n_2)/3n] = \mathcal{E}(p^2)$

(5.2)  $E_0[n_3/n] = \mathcal{E}(p^3)$

Thus we may estimate $\Delta_0 = 2\mathcal{E}\{p(1 - p)\}$ by

(5.3)  $D = 2(n_1 + n_2)/3n$

for which

(5.4)  $\sigma_0^2 = n\mathrm{Var}_0(D) = \mathcal{E}\{4p(1 - p)[1 - 3p(1 - p)]/3\}$.

To bound $\sigma_0^2$ using our estimates, we need to bound $\mathcal{E}(p^4)$ given $\mathcal{E}(p)$, $\mathcal{E}(p^2)$ and $\mathcal{E}(p^3)$. That problem is addressed in Appendix A7.
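The moment relations (5.2) to (5.4) hold trial by trial, so they can be checked exactly for a single success probability. The following sketch uses exact rational arithmetic; the variable names and the chosen value of $p$ are illustrative assumptions.

```python
from fractions import Fraction
from math import comb


def expected_proportions(p):
    """E_0(n_j / n) for one trial with success probability p, j = 0..3 (5.1)."""
    return [comb(3, j) * p ** j * (1 - p) ** (3 - j) for j in range(4)]


p = Fraction(3, 10)
e0, e1, e2, e3 = expected_proportions(p)

first_moment = (3 * e3 + 2 * e2 + e1) / 3     # should equal p
second_moment = (3 * e3 + e2) / 3             # should equal p^2
third_moment = e3                             # should equal p^3

d_mean = 2 * (e1 + e2) / 3                    # mean of D in (5.3)
# D is 2/3 times the indicator that a trial shows 1 or 2 successes.
r = e1 + e2
d_var = Fraction(4, 9) * r * (1 - r)          # per-trial variance, as in (5.4)
print(first_moment, second_moment, third_moment, d_mean, d_var)
```

Averaging these per-trial identities over the $n$ values of $p_i$ gives the statements in the text.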

With 4 observations on $p$, we will have estimates of $\mathcal{E}(p)$, $\mathcal{E}(p^2)$, $\mathcal{E}(p^3)$ and $\mathcal{E}(p^4)$ and the variance of the natural estimate of $\Delta_0$ may be estimated consistently from the available data.

5.2  Multiple  Observations  for  Problem  2. 

Suppose that for the test of $H_0$ we have two observations on $p^{(1)}$ and one for $p^{(2)}$. We may label our observations by $n_{jk}$ = number of trials with $j$ successes on $p_i^{(1)}$ and $k$ successes on $p_i^{(2)}$ with $j = 0, 1, 2$ and $k = 0, 1$.

One test statistic that may be used would be based on $(n_{20} + n_{01})/n$ which has expectation $\Delta_{30} = \mathcal{E}(p - p^3)$ under the hypothesis $H_0$. From the observations on $p^{(1)}$ we can estimate $\mathcal{E}(p^{(1)})$ and $\mathcal{E}(p^{(1)2})$. From those on $p^{(2)}$ we can estimate $\mathcal{E}(p^{(2)})$. Given $\mathcal{E}(p)$ and $\mathcal{E}(p^2)$, the bounds on $\Delta_{30}$ are derived in Appendix A8. In particular the maximum value is $\pi - [\mathcal{E}(p^2)]^2/\pi$ which may be estimated by substituting $\hat\pi = (2n_{20} + n_{10} + n_{01})/3n$ for $\pi$ and $(n_{20}/n)$ for $\mathcal{E}(p^2)$. Thus a natural test statistic is

(5.5)  $T_2 = \frac{1}{n}\left\{(n_{20} + n_{01}) - \left[\frac{2n_{20} + n_{10} + n_{01}}{3} - \frac{3n_{20}^2}{2n_{20} + n_{10} + n_{01}}\right]\right\} = \frac{1}{n}\left(\frac{n_{20} - n_{10} + 2n_{01}}{3} + \frac{3n_{20}^2}{2n_{20} + n_{10} + n_{01}}\right)$

We shall not elaborate on bounds on the variance of $T_2$ here. In a personal communication, K.F. Yu has pointed out that with 2 observations on each of $p^{(1)}$ and $p^{(2)}$, the statistic $(n_{02} + n_{20}) - n_{11}/2$ has mean 0 and variance estimated by $n_{02} + n_{20} + n_{11}/4$ under the hypothesis. Thus bounds are no longer required.
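Yu's observation can be confirmed exactly for a single trial. The sketch below assumes two independent Bernoulli observations on each member of a pair, all with the same success probability $p$ under the hypothesis, and uses exact rational arithmetic; the function name is hypothetical.

```python
from fractions import Fraction


def yu_per_trial_moments(p):
    """Mean and variance of the per-trial contribution to (n02 + n20) - n11/2."""
    q = 1 - p
    prob_02 = q ** 2 * p ** 2          # 0 successes on the first, 2 on the second
    prob_20 = p ** 2 * q ** 2          # 2 successes on the first, 0 on the second
    prob_11 = (2 * p * q) ** 2         # 1 success on each
    mean = prob_02 + prob_20 - prob_11 / 2
    second = prob_02 + prob_20 + prob_11 / 4   # E of the squared contribution
    variance = second - mean ** 2
    # Expected per-trial value of the variance estimate n02 + n20 + n11/4.
    estimate_mean = prob_02 + prob_20 + prob_11 / 4
    return mean, variance, estimate_mean


mean, variance, estimate_mean = yu_per_trial_moments(Fraction(2, 5))
print(mean, variance, estimate_mean)
```

Because the mean is zero for every $p$, the variance equals the second moment, and summing over independent trials shows that $n_{02} + n_{20} + n_{11}/4$ is unbiased for the variance of the statistic.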


Appendix. 

The following remarks represent a brief summary of the Geometry of Moments which is a major tool in deriving many of the bounds in this appendix. Let $h(X)$ be a $k$-dimensional vector valued function of a random variable $X$. As the distribution $F$ of $X$ varies over a convex set of distributions, the range of $Eh(X)$ is a convex set. If the class of distributions is the set of all distributions over a closed bounded interval $I$ and $h$ is continuous on $I$, the range of $Eh(X)$ is the convex set generated by $\{h(x) : x \in I\}$ and is closed and bounded.

To maximize one coordinate of $Eh(X)$ when the others are specified involves a boundary point of the convex set which can be represented in terms of a $k$ point distribution (involving at most $k$ points of $I$). Moreover there is a supporting hyperplane at this boundary point which maximizes some linear function $Ea^T h(X)$, and every one of the $k$ or fewer points of $I$ maximizes $a^T h(x)$ for $x \in I$. Finally the coefficient of the coordinate being maximized can be taken to be one if the specified expectations lie in an interior point of their $k - 1$ dimensional convex range.

As a simple example consider the range of $(E(X), E(X^2))$ over the class of all distributions on [0,1]. This is the convex set generated by $A = \{(x_1, x_2) : x_2 = x_1^2, 0 \le x_1 \le 1\}$ and is the set bounded by $A$ and $B = \{(x_1, x_2) : x_2 = x_1, 0 \le x_1 \le 1\}$. It follows that, subject to $EX = \mu$, $\mu^2 \le EX^2 \le \mu$. Moreover, it is clear that these bounds may be achieved by the one point distribution at $\mu$ and a two point distribution which gives probability $\mu$ to 1 and $1 - \mu$ to 0.

Given any point for which $\mu^2 < \mu_2 < \mu$, the class of distributions for which $EX = \mu$ and $EX^2 = \mu_2$ must have support on a subset of $A$, the convex hull of which contains $(\mu, \mu_2)$. It follows that there are two points $q_1$ and $q_2$ in [0,1] such that no distribution for which $P\{X > q_1\} = 1$ or for which $P\{X < q_2\} = 1$ will yield the given values of $(EX, EX^2)$. These two points are obtained by observing where the lines from (1,1) and (0,0) through $(\mu, \mu_2)$ intersect the generating parabola segment $A$. Thus, for $0 < \mu < 1$

(A.1)  $q_1 = (\mu - \mu_2)/(1 - \mu)$

and

(A.2)  $q_2 = \mu_2/\mu$.

A1.  Special 2 Point and 3 Point Distributions with Specified Mean and Variance.

We will have occasion to consider several special two point and 3 point distributions. First we consider the two point distribution on 0 and $q$ with specified values of $(EX, EX^2) = (\mu, \mu_2)$ where $0 < \mu < 1$, $0 < q \le 1$, and $q$ is assigned probability $\theta$. Then $\theta q = \mu$ and $\theta q^2 = \mu_2$ and

(A1.1)  $q = \mu_2/\mu = q_2$ and $\theta = \mu^2/\mu_2$.

Incidentally, for this distribution

(A1.2a)  $\mu'_{30} = EX^3 = \mu_2^2/\mu$

and

(A1.2b)  $\mu'_{40} = EX^4 = \mu_2^3/\mu^2$.

Also,

(A1.3a)  $\mu_{30} = E(X - \mu)^3 = \frac{\sigma^2}{\mu}(\sigma^2 - \mu^2)$

and

(A1.3b)  $\mu_{40} = E(X - \mu)^4 = \frac{\sigma^2(\sigma^6 + \mu^6)}{\mu^2(\sigma^2 + \mu^2)} = V_2(\mu, \sigma)$

where $\sigma^2 = \mu_2 - \mu^2$ is the variance of $X$.

Next we consider the two point distribution on 1 and $q$ with specified values of $(EX, EX^2) = (\mu, \mu_2)$ where $0 < \mu < 1$ and $0 \le q < 1$ and $q$ is assigned probability $\theta$. Then, consideration of the transformation $Y = 1 - X$ yields

(A1.4)  $q = (\mu - \mu_2)/(1 - \mu) = q_1$ and $\theta = (1 - \mu)^2/[(1 - \mu)^2 + \sigma^2]$,

and

(A1.5a)  $\mu_{31} = E(X - \mu)^3 = -\frac{\sigma^2}{1 - \mu}(\sigma^2 - (1 - \mu)^2)$

and

(A1.5b)  $\mu_{41} = E(X - \mu)^4 = V_2(1 - \mu, \sigma)$
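The two special distributions in (A1.1) and (A1.4) can be built explicitly and their moments compared with the stated formulas. This is a minimal sketch with illustrative function names; the values of $\mu$ and $\mu_2$ are arbitrary choices satisfying $\mu^2 < \mu_2 < \mu$.

```python
from math import isclose


def two_point_with_zero(mu, mu2):
    """(A1.1): mass theta at q2 = mu2/mu and 1 - theta at 0."""
    q, theta = mu2 / mu, mu ** 2 / mu2
    return [(0.0, 1 - theta), (q, theta)]


def two_point_with_one(mu, mu2):
    """(A1.4): mass theta at q1 = (mu - mu2)/(1 - mu) and 1 - theta at 1."""
    sigma_sq = mu2 - mu ** 2
    q = (mu - mu2) / (1 - mu)
    theta = (1 - mu) ** 2 / ((1 - mu) ** 2 + sigma_sq)
    return [(1.0, 1 - theta), (q, theta)]


def raw_moment(dist, k):
    return sum(w * x ** k for x, w in dist)


def central_moment(dist, k):
    m = raw_moment(dist, 1)
    return sum(w * (x - m) ** k for x, w in dist)


def v2(mu, sigma_sq):
    """V_2 of (A1.3b), written in terms of sigma^2."""
    return sigma_sq * (sigma_sq ** 3 + mu ** 6) / (mu ** 2 * (sigma_sq + mu ** 2))


mu, mu2 = 0.4, 0.2
s2 = mu2 - mu ** 2
at_zero = two_point_with_zero(mu, mu2)
at_one = two_point_with_one(mu, mu2)
print(central_moment(at_zero, 3), central_moment(at_one, 3))
```

For these values the third central moment is negative for the distribution involving 0 and positive for the one involving 1, in agreement with the signs in (A1.3a) and (A1.5a).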

A more general two point distribution with specified $(\mu, \mu_2)$ will assign probability $\theta$ to $\mu + r(1 - \theta)$ and $1 - \theta$ to $\mu - r\theta$ for $r > 0$ and $0 < \theta < 1$. For this distribution $r$ and $\theta$ are connected by

(A1.6)  $\sigma^2 = r^2\theta(1 - \theta)$.

If we drop the restriction that $\mu - r\theta$ and $\mu + r(1 - \theta)$ be in the interval [0,1] we have $\theta = 1/2$ when $r = 2\sigma$. Then we will have use for $dr/d\theta$ and $d(r\theta)/d\theta$. Differentiating (A1.6) with $\sigma$ held fixed gives $dr/d\theta = r(\theta - 1/2)/[\theta(1 - \theta)]$, so that

(A1.7)  $d(r\theta)/d\theta = r/[2(1 - \theta)] > 0$

and

(A1.8)  $d(r(1 - \theta))/d\theta = -r/(2\theta) < 0$.

Finally, consider the 3 point distribution which assigns probability $\phi$ to 1, $\theta$ to $q$ and $1 - \theta - \phi$ to 0 where $0 < q < 1$. For the convex hull of (0,0), (1,1) and $(q, q^2)$ to contain $(\mu, \mu_2)$, where $0 < \mu^2 < \mu_2 < \mu < 1$, we must have $q_1 \le q \le q_2$. Then it is easy to derive $q = (\mu_2 - \phi)/(\mu - \phi)$, and

(A1.9)  $\phi = \frac{\mu_2 - \mu q}{1 - q}$

and

(A1.10)  $\theta = \frac{\mu - \mu_2}{q(1 - q)}$

A2.  Bounds on $E(X - \mu)^4$ and $E(X - 1/2)^4$.

We derive upper and lower bounds on $\mu_4 = E(X - \mu)^4$ and $E(X - 1/2)^4$ subject to $P\{0 \le X \le 1\} = 1$ and specified values of $\mu$ and $\mu_2 = \sigma^2 + \mu^2$. The trivial cases where $\mu_2 = \mu^2$ and $\mu_2 = \mu$ are bypassed.

Since $\mu_4 = E(X - \mu)^4 = EX^4 - 4\mu EX^3 + 6\mu^2\mu_2 - 3\mu^4$, we may consider optimizing $E(X^4 - 4\mu X^3)$. The function $g(x) = x^4 - 4\mu x^3 - \lambda_1 x - \lambda_2 x^2$ has at most one local maximum and two local minima. It follows that the maximum of $g(x)$ over [0,1] can be attained on at most 3 points, of which only one can be an interior point. The minimum can be attained on at most two points.

To attack the maximization problem, we first apply the 3 point distribution of Appendix A1, and

$E(X^4 - 4\mu X^3) = \phi(1 - 4\mu) + \theta(q^4 - 4\mu q^3)$

$= \mu(1 - 4\mu) - (\mu - \mu_2)[(1 - 4\mu)(1 + q) + q^2]$

which attains its maximum at $q = 2\mu - 1/2$. But we are restricted to $q_1 \le q \le q_2$ by the argument in A1. Hence the restricted maximum of $E(X - \mu)^4$ occurs when

(A2.1)  $q = q_0 = \mathrm{med}(q_1, q_2, 2\mu - 1/2)$.

This implies that we have a 2 point distribution when $2\mu - 1/2 \le q_1$ or when $2\mu - 1/2 \ge q_2$. In particular, whenever $\mu \le 1/4$, we have a 2 point distribution. For $1/4 < \mu < 3/4$, we may have a 2 or 3 point distribution depending on the value of $\mu_2$.

To maximize $E(X - 1/2)^4 = E(X^4 - 2X^3) + 3\mu_2/2 - \mu/2 + 1/16$, we again apply the 3 point distribution to

$E(X^4 - 2X^3) = -\phi + \theta(q^4 - 2q^3)$

$= -\mu + (\mu - \mu_2)(1 + q - q^2)$

which is maximized at $q = 1/2$. Thus the restricted maximum occurs when

(A2.2)  $q = q_0 = \mathrm{med}(q_1, q_2, 1/2)$.
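The median rule in (A2.1) is simply the clamping of an unconstrained maximizer to the interval $[q_1, q_2]$. The sketch below, with illustrative names, evaluates the objective for the 3 point distribution on a grid of admissible $q$ values and checks that no grid point beats $q_0$; the grid size and the test cases are arbitrary assumptions.

```python
from statistics import median


def q_limits(mu, mu2):
    """q1 and q2 from (A.1) and (A.2)."""
    return (mu - mu2) / (1 - mu), mu2 / mu


def objective(q, mu, mu2):
    """E(X^4 - 4*mu*X^3) for the 3 point distribution on 0, q and 1."""
    return mu * (1 - 4 * mu) - (mu - mu2) * ((1 - 4 * mu) * (1 + q) + q * q)


def best_q(mu, mu2):
    """Equation (A2.1): q0 = med(q1, q2, 2*mu - 1/2)."""
    q1, q2 = q_limits(mu, mu2)
    return median([q1, q2, 2 * mu - 0.5])


def grid_check(mu, mu2, steps=2000):
    """True if no admissible grid value of q gives a larger objective than q0."""
    q1, q2 = q_limits(mu, mu2)
    best = objective(best_q(mu, mu2), mu, mu2)
    grid = [q1 + (q2 - q1) * k / steps for k in range(steps + 1)]
    return all(objective(q, mu, mu2) <= best + 1e-12 for q in grid)


cases = [(0.2, 0.1), (0.5, 0.3), (0.8, 0.7), (0.6, 0.4)]
all_ok = all(grid_check(mu, mu2) for mu, mu2 in cases)
print(all_ok, [best_q(mu, mu2) for mu, mu2 in cases])
```

Since the objective is a concave quadratic in $q$ whenever $\mu_2 < \mu$, clamping is guaranteed to give the constrained maximum; the grid search is only a sanity check.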

For the minimization problem for $E(X - \mu)^4$, consider first the 2 point distribution which minimizes $E(X - \mu)^4$ without the restriction of $X$ to [0,1]. That is clearly the distribution which attaches probability 1/2 to each of $\mu \pm \sigma$ and yields the value $\sigma^4$. If $0 \le \mu - \sigma < \mu + \sigma \le 1$, this distribution solves the restricted minimization problem.

Since $\sigma^2 = \mu_2 - \mu^2 \le \mu - \mu^2 \le 1/4$, $\mu - \sigma < 0$ implies $\mu < \sigma \le 1/2$. Similarly $\mu + \sigma > 1$ implies $\mu > 1/2$. If $\mu - \sigma < 0$, we refer to the two point distribution of A1 at $\mu - r\theta$ and $\mu + r(1 - \theta)$. Then

$v_1(\theta) = E(X - \mu)^4 = r^4\theta(1 - \theta)[1 - 3\theta(1 - \theta)] = \sigma^4\left[\frac{1}{\theta(1 - \theta)} - 3\right]$.

Since $d(r\theta)/d\theta > 0$, it follows that as $\mu - r\theta$ increases from $\mu - \sigma$ where $\theta = 1/2$, $\theta$ decreases and $v_1(\theta)$ increases. Thus the minimum value of $v_1(\theta)$, subject to the restrictions, occurs when $\mu - r\theta = 0$, i.e., for the 2 point distribution at 0 and $q_2$, and the minimum value of $E(X - \mu)^4$ is

(A2.3)  $V_2(\mu, \sigma) = \sigma^2(\mu^6 + \sigma^6)/[\mu^2(\mu^2 + \sigma^2)]$.

A symmetric argument for the case $\mu + \sigma > 1$ yields the two point distribution at $q_1$ and 1 with the minimum value of $V_2(1 - \mu, \sigma)$.

The minimization problem for $E(X - 1/2)^4$ is somewhat more complicated. We note that $E(X - 1/2)^4 = E[(X - \mu)^4 + 4(\mu - 1/2)(X - \mu)^3] + 6(\mu - 1/2)^2\sigma^2 + (\mu - 1/2)^4$ and that it suffices to minimize

$v_2 = E\{(X - \mu)^4 + 4(\mu - 1/2)(X - \mu)^3\}$

$= \sigma^2[(r^2 - 3\sigma^2) + 2(2\mu - 1)r(1 - 2\theta)]$

subject to the restrictions. Here we have used $E(X - \mu)^3 = r^3\theta(1 - \theta)(1 - 2\theta) = \sigma^2 r(1 - 2\theta)$ for the two point distribution of A1.

Suppose that $0 < \mu < 1/2$. Since $r$ takes on the same value for $\theta$ and $1 - \theta$, it is clear that the minimizing value of $\theta$ will be less than 1/2. Ignoring the restriction $0 \le \mu - r\theta < \mu + r(1 - \theta) \le 1$, and using $dr/d\theta = r(\theta - 1/2)/[\theta(1 - \theta)]$ from (A1.6), we have

(A2.4)  $\frac{dv_2}{d\theta} = \frac{\sigma^2 r(2\theta - 1)}{\theta(1 - \theta)}[r - r_0(\theta)]$

where

(A2.5)  $r_0(\theta) = (2\mu - 1)/(2\theta - 1)$.

Then, as $\theta$ goes from 0 to 1/2, $r$ decreases from $\infty$ to $2\sigma$, $\theta r$ increases from 0 to $\sigma$ and $r_0(\theta)$ increases from $1 - 2\mu > 0$ to $\infty$. Thus, there is a unique value $\theta_0$ of $\theta$ for which $r_0(\theta_0) = r$ and $\theta_0 < 1/2$.

If $0 \le \mu - \theta_0 r_0(\theta_0) < \mu + (1 - \theta_0)r_0(\theta_0) \le 1$, $\theta_0$ and $r_0(\theta_0)$ define the minimizing two point distribution. If $\mu - \theta_0 r_0(\theta_0) < 0$, we see that as $\theta$ decreases from $\theta_0$, $r\theta$ decreases and $\mu - r\theta$ increases. At the same time $r$ increases and $r_0(\theta)$ decreases. Thus $dv_2/d\theta < 0$ and $v_2$ increases. Then the minimizing value of $v_2$ subject to the restrictions will occur when $\mu - r\theta = 0$, i.e., for the two point distribution at 0 and $q_2$.

If $0 \le \mu - \theta_0 r_0(\theta_0) < 1 < \mu + (1 - \theta_0)r_0(\theta_0)$, then as $\theta$ increases from $\theta_0$ toward 1/2, $r(1 - \theta)$ decreases, $r$ decreases, $r_0(\theta)$ increases, and hence $v_2$ increases. Then the minimizing value, as long as $\theta < 1/2$, occurs at the two point distribution at 1 and $q_1$. Since we showed above that the minimizing value of $\theta$ subject to the restrictions is less than 1/2 we have demonstrated, for $\mu < 1/2$, that the minimizing distribution is one of three two point distributions depending on $\theta_0$ and $r_0(\theta_0)$.

The case of $\mu > 1/2$ follows by symmetry. When $\mu = 1/2$ we have $\sigma \le 1/2$ and the two point distribution on $\mu \pm \sigma$ is the minimizing distribution.

A3.  Bounds on $\sigma_0^2$.

Since $\Delta_0 = \mathcal{E}(d^{(0)})$ and $\sigma_0^2 = \mathcal{E}\{d^{(0)}(1 - d^{(0)})\}$ and $d_i^{(0)} = 2p_i(1 - p_i)$ can vary from 0 to 1/2, the range of $(\Delta_0, \sigma_0^2)$ is the convex hull of $A = \{(x, x(1 - x)) : 0 \le x \le 1/2\}$. That convex set is bounded by $A$ and $B$, the straight line segment from (0,0) to (1/2, 1/4). Thus $A$ and $B$ determine the upper and lower bounds of $\sigma_0^2$ for given $\Delta_0$ and indicate how they may be achieved.

The lower bound is attained when some of the $p_i$ are 1/2 and all the others are 0 or 1. The upper bound is attained when all the $d_i^{(0)}$ are equal to $\Delta_0$. Except when $\Delta_0 = 1/2$, there are 2 possible values of $p_i$ which give the same value of $d_i^{(0)} = \Delta_0$.

The bounds can be refined if we are given $\Delta_0$ and $\pi$. Then our problem becomes that of minimizing and maximizing $\mathcal{E}\{2p(1 - p)[1 - 2p(1 - p)]\}$ subject to specified values of $\mathcal{E}(p)$ and $\mathcal{E}(p^2)$. But that reduces to maximizing and minimizing $\mathcal{E}(p^4 - 2p^3)$ or $\mathcal{E}(p - 1/2)^4$. That problem is treated in A2.


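A4.  Mean and Variance of $T$.

Let $T_1 = (a_1 n_{10} + a_2 n_{01})/n - (a_1 + a_2)\hat\pi(1 - \hat\pi)$. Section 3 deals with the case where $a_1 = a_2 = 1$. We represent the outcome for the i-th pair by $(X_i^{(1)}, X_i^{(2)})$ and by $(X_{00i}, X_{10i}, X_{01i}, X_{11i})$ where $X_{jki} = 1$ if the outcome is $(j, k)$ and 0 otherwise. Then $X_i^{(1)} = X_{10i} + X_{11i}$, $X_i^{(2)} = X_{01i} + X_{11i}$,

$\hat\pi^{(1)} = n^{-1}\sum_{i=1}^{n}(X_{10i} + X_{11i}) = n^{-1}\sum_{i=1}^{n} X_i^{(1)} = (n_{10} + n_{11})/n$

and

$\hat\pi^{(2)} = n^{-1}\sum_{i=1}^{n}(X_{01i} + X_{11i}) = n^{-1}\sum_{i=1}^{n} X_i^{(2)} = (n_{01} + n_{11})/n$

Let $\epsilon_i = (X_i^{(1)} + X_i^{(2)})/2 - p_i$. Then

$\hat\pi^2 = \left(\pi + n^{-1}\sum_{i=1}^{n}\epsilon_i\right)^2 = \pi^2 + 2\pi n^{-1}\sum_{i=1}^{n}\epsilon_i + \left(n^{-1}\sum_{i=1}^{n}\epsilon_i\right)^2$

and

(A4.1)  $T_1 = (a_1 + a_2)(\pi^2 - \pi) + n^{-1}\sum_{i=1}^{n}\left[a_1 X_{10i} + a_2 X_{01i} - (a_1 + a_2)(1 - 2\pi)\epsilon_i\right] + (a_1 + a_2)\left(n^{-1}\sum_{i=1}^{n}\epsilon_i\right)^2$

where the last term is $O_p(n^{-1})$ and has mean $(a_1 + a_2)\mathcal{E}[p^{(1)}(1 - p^{(1)}) + p^{(2)}(1 - p^{(2)})]/4n$ and the second term is $\mathcal{E}[a_1 p^{(1)}(1 - p^{(2)}) + a_2 p^{(2)}(1 - p^{(1)})] + O_p(n^{-1/2})$ and has the same variance as $n^{-1}\sum S_i$ where

$S_i = a_1 X_{10i} + a_2 X_{01i} - (a_1 + a_2)(1 - 2\pi)(X_i^{(1)} + X_i^{(2)})/2$

Thus

(A4.2)  $E T_1 = (a_1 + a_2)\left\{-\sigma_{12} + \frac{a_1 - a_2}{2(a_1 + a_2)}(\pi^{(1)} - \pi^{(2)}) + \frac{(\pi^{(1)} - \pi^{(2)})^2}{4} + \mathcal{E}[p^{(1)}(1 - p^{(1)}) + p^{(2)}(1 - p^{(2)})]/4n\right\}$

where $\sigma_{12} = \mathcal{E}(p^{(1)}p^{(2)}) - \pi^{(1)}\pi^{(2)}$.

Now we rewrite

$S_i = b_1 X_i^{(1)} + b_2 X_i^{(2)} - (a_1 + a_2)X_{11i}$

where $b_1 = (a_1 + a_2)\pi + (a_1 - a_2)/2$ and $b_2 = (a_1 + a_2)\pi - (a_1 - a_2)/2$, and we observe that $\mathrm{Cov}(X_i^{(1)}, X_{11i}) = p_i^{(1)}p_i^{(2)}(1 - p_i^{(1)})$ and $\mathrm{Cov}(X_i^{(2)}, X_{11i}) = p_i^{(1)}p_i^{(2)}(1 - p_i^{(2)})$. It follows that, neglecting the $O_p(n^{-1})$ term of $T_1$, $\mathrm{Var}\,T_1 \approx n^{-1}\tau_1^2$ where

(A4.3)  $\tau_1^2 = \mathcal{E}\{b_1^2 p^{(1)}(1 - p^{(1)}) + b_2^2 p^{(2)}(1 - p^{(2)}) + (a_1 + a_2)^2 p^{(1)}p^{(2)}(1 - p^{(1)}p^{(2)}) - 2(a_1 + a_2)p^{(1)}p^{(2)}[b_1(1 - p^{(1)}) + b_2(1 - p^{(2)})]\}$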

To derive Equation (3.2) we set $a_1 = a_2 = 1$ in (A4.2) and note that

$\Delta - 2\pi(1 - \pi) = \mathcal{E}(p^{(1)} + p^{(2)} - 2p^{(1)}p^{(2)}) - (\pi^{(1)} + \pi^{(2)})\left(1 - \frac{\pi^{(1)} + \pi^{(2)}}{2}\right)$

$= \pi^{(1)} + \pi^{(2)} - 2\sigma_{12} - 2\pi^{(1)}\pi^{(2)} - (\pi^{(1)} + \pi^{(2)}) + (\pi^{(1)} + \pi^{(2)})^2/2$

$= -2\sigma_{12} + (\pi^{(1)} - \pi^{(2)})^2/2$

To derive Equation (3.4), we set $a_1 = a_2 = 1$ in (A4.3). Then $b_1 = b_2 = 2\pi$. The matching of coefficients of $4\pi^2$, $4\pi$ and 1 in these two disparate forms involves showing that

$p^{(1)}(1 - p^{(1)}) + p^{(2)}(1 - p^{(2)}) = d - (p^{(1)} - p^{(2)})^2$

$-2p^{(1)}p^{(2)}(2 - p^{(1)} - p^{(2)}) = -2pd + (p^{(1)} - p^{(2)})^2$

$4p^{(1)}p^{(2)}(1 - p^{(1)}p^{(2)}) = 4pd - d^2 - (p^{(1)} - p^{(2)})^2$

and this may be facilitated by noticing that $p^{(1)} + p^{(2)} = 2p$, $d = 2p - 2p^{(1)}p^{(2)}$ and

$(p^{(1)} - p^{(2)})^2 = 4p^2 - 4p^{(1)}p^{(2)}$
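The three algebraic identities used to match (A4.3) with (3.4) can be spot-checked numerically. The sketch below evaluates the largest discrepancy over random pairs of probabilities; the function and variable names are illustrative assumptions.

```python
import random


def identity_residuals(p1: float, p2: float):
    """Left minus right side for each of the three identities in A4."""
    p = (p1 + p2) / 2                       # average of the two probabilities
    d = p1 * (1 - p2) + p2 * (1 - p1)       # probability of a non-matching pair
    gap_sq = (p1 - p2) ** 2
    r1 = p1 * (1 - p1) + p2 * (1 - p2) - (d - gap_sq)
    r2 = -2 * p1 * p2 * (2 - p1 - p2) - (-2 * p * d + gap_sq)
    r3 = 4 * p1 * p2 * (1 - p1 * p2) - (4 * p * d - d ** 2 - gap_sq)
    return r1, r2, r3


rng = random.Random(3)
max_residual = 0.0
for _ in range(2000):
    residuals = identity_residuals(rng.random(), rng.random())
    max_residual = max(max_residual, max(abs(r) for r in residuals))
print(max_residual)
```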


A5.  Bounds  on  A  and  EXY . 

First we consider upper and lower bounds on $E(XY)$ subject to the restrictions that
$EX = \mu$, $EY = \nu$, and $0 \le X \le 1$ and $0 \le Y \le 1$ with probability one. We have
$0 \le EXY \le EX = \mu$. Similarly $0 \le EXY \le \nu$. Moreover $E(1 - X)(1 - Y) \ge 0$ and
hence $EXY \ge \mu + \nu - 1$. Thus

(A5.1)  $\max(0, EX + EY - 1) \le E(XY) \le \min(EX, EY).$

Moreover these bounds are easily attained using 2 point distributions on adjacent edges of
the unit square. For example if $EX \le EY$, the distribution which assigns probability $\nu$
to $(\mu/\nu, 1)$ and $(1 - \nu)$ to $(0,0)$ yields $EXY = \mu$. If $\mu + \nu \ge 1$, the distribution which
assigns probability $(1 - \mu)$ to $(0,1)$ and $\mu$ to $(1, (\mu + \nu - 1)/\mu)$ yields $EXY = \mu + \nu - 1$.
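To consider $\Delta$ we note that given $\pi^{(1)}$ and $\pi^{(2)}$ with $\pi^{(1)} \le \pi^{(2)}$, it follows that

$0 \le \mathcal{E}p^{(1)}p^{(2)} \le \pi^{(1)}$ if $0 \le \pi \le 1/2$

and

$2\pi - 1 \le \mathcal{E}p^{(1)}p^{(2)} \le \pi^{(1)}$ if $1/2 \le \pi \le 1$.

Since $\Delta = \mathcal{E}(p^{(1)} + p^{(2)} - 2p^{(1)}p^{(2)}) = 2(\pi - \mathcal{E}p^{(1)}p^{(2)})$ it follows

(A5.2a)  $2\pi \ge \Delta \ge |\pi^{(1)} - \pi^{(2)}| \ge 0$ if $0 \le \pi \le 1/2$

and

(A5.2b)  $2(1 - \pi) \ge \Delta \ge |\pi^{(1)} - \pi^{(2)}| \ge 0$ if $1/2 \le \pi \le 1$.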
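The two-point distributions used to attain the bounds in (A5.1) can be verified directly. The following sketch represents a distribution as a list of (probability, x, y) atoms and computes its moments; the helper names and the values 0.6 and 0.7 for the two means are illustrative assumptions.

```python
def moments(dist):
    """Return (EX, EY, EXY) for a list of (probability, x, y) atoms."""
    ex = sum(w * x for w, x, y in dist)
    ey = sum(w * y for w, x, y in dist)
    exy = sum(w * x * y for w, x, y in dist)
    return ex, ey, exy


def upper_bound_dist(mu, nu):
    """Two atoms attaining EXY = mu; assumes 0 < mu <= nu <= 1."""
    return [(nu, mu / nu, 1.0), (1 - nu, 0.0, 0.0)]


def lower_bound_dist(mu, nu):
    """Two atoms attaining EXY = mu + nu - 1; assumes mu + nu >= 1 and mu > 0."""
    return [(1 - mu, 0.0, 1.0), (mu, 1.0, (mu + nu - 1) / mu)]


mu, nu = 0.6, 0.7  # illustrative means with mu <= nu and mu + nu >= 1
upper = moments(upper_bound_dist(mu, nu))
lower = moments(lower_bound_dist(mu, nu))
print("upper-bound distribution (EX, EY, EXY):", upper)
print("lower-bound distribution (EX, EY, EXY):", lower)
```

In both cases the first two entries should reproduce the prescribed means, and the third should equal $\min(\mu, \nu)$ and $\mu + \nu - 1$ respectively, up to floating-point round-off.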


A6. Bounds on the Variance of $(p^{(1)} - \pi^{(1)})(p^{(2)} - \pi^{(2)})$.

The problem of establishing bounds on $\mathcal{E}\{(p^{(1)} - \pi^{(1)})^2 (p^{(2)} - \pi^{(2)})^2\}$ subject to
specified values of $\pi^{(1)}$, $\pi^{(2)}$ and $\sigma_{12}$ may be rephrased as that of minimizing and
maximizing $EX^2Y^2$ or the variance of $XY$ subject to the restrictions $EX = EY = 0$,
$EXY = c$, and $(X, Y) \in R = \{(x, y) : -\alpha \le x \le 1 - \alpha,\ -\beta \le y \le 1 - \beta\}$ where $\alpha$
and $\beta$, representing $\pi^{(1)}$ and $\pi^{(2)}$, are between 0 and 1. Applying A5 we see that

(A6.1)  $-c_2 = -\min(\alpha\beta, (1 - \alpha)(1 - \beta)) \le c \le \min(\alpha(1 - \beta), \beta(1 - \alpha)) = c_1.$

This result can also be derived using the Geometry of Moments by studying where
$xy - \lambda_1 x - \lambda_2 y$ is minimized and maximized.

It is possible to demonstrate that the maximum is attained by a three point distribution,
with two of the points on opposite vertices of $R$.

The minimization problem reduces to two cases. The easier case is that where $-c_1 \le
c \le c_2$. In that case the two branches of the hyperbola $xy = c$ have points in $R$ and it
is possible to find a two point distribution for which $\mathrm{Var}(XY) = 0$.
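One way to exhibit such a two point distribution, offered here as a sketch rather than as the construction intended in the text, is to place probability 1/2 on each of the points $(x, c/x)$ and $(-x, -c/x)$. Both points lie on the hyperbola $xy = c$ and their means vanish by symmetry. The choice $x = \min(\alpha, 1 - \alpha)$ and the function names are assumptions made for the example, and the sketch only covers $|c| \le \min(\alpha, 1 - \alpha)\min(\beta, 1 - \beta)$, which is the range where the easier case and the feasibility constraint (A6.1) both hold.

```python
def two_point_zero_variance(alpha, beta, c):
    """Symmetric two-point distribution on xy = c inside R (hypothetical helper)."""
    a = min(alpha, 1 - alpha)   # largest |x| admissible on both sides of zero
    b = min(beta, 1 - beta)     # largest |y| admissible on both sides of zero
    if not (a > 0 and b > 0 and abs(c) <= a * b):
        raise ValueError("c is outside the range handled by this sketch")
    x, y = a, c / a
    return [(0.5, x, y), (0.5, -x, -y)]


def in_region(dist, alpha, beta):
    """True if every atom lies in R = [-alpha, 1-alpha] x [-beta, 1-beta]."""
    return all(-alpha <= x <= 1 - alpha and -beta <= y <= 1 - beta
               for _, x, y in dist)


def summarize(dist):
    """Return (EX, EY, EXY, Var(XY)) for a list of (probability, x, y) atoms."""
    ex = sum(w * x for w, x, y in dist)
    ey = sum(w * y for w, x, y in dist)
    exy = sum(w * x * y for w, x, y in dist)
    ex2y2 = sum(w * (x * y) ** 2 for w, x, y in dist)
    return ex, ey, exy, ex2y2 - exy ** 2


alpha, beta, c = 0.2, 0.8, 0.03  # illustrative values
dist = two_point_zero_variance(alpha, beta, c)
print("atoms:", dist)
print("inside R:", in_region(dist, alpha, beta))
print("(EX, EY, EXY, Var XY):", summarize(dist))
```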

For some values of $\alpha$ and $\beta$, it is possible to find values of $c$ where $c_2 < c \le c_1$.
In those cases we can show that there is a solution involving at most 4 points, only one of
which can be an interior point. The conjecture that there is a two point solution consisting
of a vertex and another point (on the line from the vertex through the origin) is supported
by numerical calculations.

A7. Bounds on $\delta_0$.

Since $\delta_0 = \mathcal{E}\{(4p - 16p^2 + 24p^3 - 12p^4)/3\}$, minimizing and maximizing $\delta_0$ subject
to specified values of $\mathcal{E}(p)$, $\mathcal{E}(p^2)$ and $\mathcal{E}(p^3)$ is equivalent to maximizing and minimizing
$EX^4$ subject to the specified values of the first 3 moments and $0 \le X \le 1$. As
in Appendix A2, maximizing $EX^4$ involves at most a 3 point distribution, only one
point of which is an interior point of $[0,1]$, and minimizing $EX^4$ involves at most a 2
point distribution. The three moments uniquely specify such distributions which may be
calculated directly.

A8. Bounds on $\Delta_{30}$.

We wish to minimize and maximize $E(X - X^3)$ subject to specified values of $EX$
and $EX^2$ and $0 \le X \le 1$. This is equivalent to maximizing and minimizing $EX^3$ or
$\mu_3 = E(X - \mu)^3$. The function $g(x) = x^3 + \lambda_1 x + \lambda_2 x^2$ has at most one local minimum
and one local maximum. It follows that both the minimum and maximum of $g$ on $[0,1]$
can involve at most two points, only one of which can be an interior point of $[0,1]$. In
the maximization case the boundary point has to be 1, and in the minimization case it is
zero. Thus the minimum and maximum of $\mu_3$ are $\mu_{30}$ and $\mu_{31}$ of Appendix A1. In
particular the maximum of $E(X - X^3)$ is $\mu - (\mu_2')^2/\mu$.
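The closed form for the maximum of $E(X - X^3)$ can be illustrated with the two extremal two-point distributions described above. In the sketch below, the distribution minimizing $EX^3$ has atoms at 0 and $\mu_2'/\mu$, and the maximizing distribution is obtained by applying the same construction to $1 - X$, which places an atom at 1. The function names and the moment values 0.5 and 0.3 are illustrative assumptions, and the code assumes $\mu^2 \le \mu_2' \le \mu$ with $0 < \mu < 1$.

```python
def raw_moment(dist, k):
    """k-th raw moment of a list of (probability, x) atoms."""
    return sum(w * x ** k for w, x in dist)


def min_third_moment_dist(mu, m2):
    """Atoms at 0 and m2/mu matching EX = mu and EX^2 = m2."""
    x0 = m2 / mu
    w = mu ** 2 / m2
    return [(1 - w, 0.0), (w, x0)]


def max_third_moment_dist(mu, m2):
    """Mirror image: build the distribution of 1 - X, then reflect it back."""
    mirrored = min_third_moment_dist(1 - mu, 1 - 2 * mu + m2)
    return [(w, 1 - y) for w, y in mirrored]


mu, m2 = 0.5, 0.3  # illustrative first and second raw moments
low = min_third_moment_dist(mu, m2)
high = max_third_moment_dist(mu, m2)

max_value = mu - raw_moment(low, 3)    # largest E(X - X^3)
min_value = mu - raw_moment(high, 3)   # smallest E(X - X^3)
closed_form = mu - m2 ** 2 / mu

print("maximum of E(X - X^3):", max_value, "closed form:", closed_form)
print("minimum of E(X - X^3):", min_value)
```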